Visual and Predictive Analytics on Singapore News: Experiments on GDELT, Wikipedia, and ^STI

Authors

  • Clifton Phua
  • Yuzhang Feng
  • Junyao Ji
  • Timothy Soh
Abstract

The open-source Global Database of Events, Language, and Tone (GDELT) is the most comprehensive and up-to-date Big Data source of important terms extracted from international news articles. We focus only on GDELT’s Singapore events to better understand the data quality of its news articles, the accuracy of its term extraction, and its potential for prediction. To test news completeness and validity, we visually compared GDELT (terms from Singapore news articles, 1979 to 2013) to Wikipedia’s timeline of Singaporean history. To test term extraction accuracy, we visually compared GDELT (CAMEO codes and the TABARI extraction system applied to Singapore news articles’ text, April to December 2013) to SAS Text Miner’s term and topic extraction. To perform predictive analytics, we propose a novel feature engineering method that transforms row-level GDELT data from individual articles to a user-specified temporal resolution. For example, we apply a decision tree to daily counts of feature values from GDELT to predict the Singapore stock market’s Straits Times Index (^STI). Of practical interest from these results is SAS Visual Analytics’ ability to highlight the various impacts of the June 2013 Southeast Asian haze and the December 2013 Little India riot on Singapore. Although Singapore is unique as a sovereign city-state and leading financial centre with strong international influence and a highly multicultural population, the visual and predictive analytics reported here are highly applicable to other countries’ GDELT data.
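The abstract describes the feature engineering step only at a high level. As a minimal sketch, assuming GDELT-style event rows with a `SQLDATE` field in YYYYMMDD form and categorical columns such as `EventRootCode` and `QuadClass`, the Python below collapses rows into per-period counts of feature values (day, ISO week, or month) and pairs each day with a next-day ^STI direction label. The function names, the toy rows, and the toy index values are hypothetical; this is an illustration of the general idea, not the authors' implementation.

```python
from collections import Counter, defaultdict
from datetime import date

# Assumed GDELT-style columns: SQLDATE (YYYYMMDD) plus categorical fields.


def bucket_key(sqldate, resolution):
    """Map a YYYYMMDD date string to a period label at the chosen resolution."""
    d = date(int(sqldate[:4]), int(sqldate[4:6]), int(sqldate[6:8]))
    if resolution == "day":
        return d.isoformat()                      # e.g. "2013-04-01"
    if resolution == "week":
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year:04d}-W{iso_week:02d}"  # ISO week label
    if resolution == "month":
        return f"{d.year:04d}-{d.month:02d}"
    raise ValueError(f"unsupported resolution: {resolution!r}")


def aggregate_counts(rows, fields, resolution="day"):
    """Collapse row-level events into one count vector per period.

    Each feature is named '<field>=<value>' and counts how many rows in the
    period carry that value. Returns (vocab, table), where table maps a
    period label to a list of counts aligned with vocab.
    """
    per_period = defaultdict(Counter)
    for row in rows:
        period = bucket_key(row["SQLDATE"], resolution)
        for field in fields:
            per_period[period][f"{field}={row[field]}"] += 1
    # A shared, sorted vocabulary gives every period the same columns;
    # values absent from a period become explicit zeros.
    vocab = sorted({name for counts in per_period.values() for name in counts})
    table = {p: [per_period[p].get(name, 0) for name in vocab]
             for p in sorted(per_period)}
    return vocab, table


def label_next_day_direction(table, closes):
    """Pair each day's features with an up (1) / not-up (0) label.

    Only days present in both `table` and `closes` are used; the label
    compares the close of the next such day with today's close.
    """
    days = sorted(d for d in table if d in closes)
    X, y = [], []
    for today, nxt in zip(days, days[1:]):
        X.append(table[today])
        y.append(1 if closes[nxt] > closes[today] else 0)
    return X, y


if __name__ == "__main__":
    # Toy rows and toy index values, for illustration only.
    rows = [
        {"SQLDATE": "20130401", "EventRootCode": "14", "QuadClass": "3"},
        {"SQLDATE": "20130401", "EventRootCode": "14", "QuadClass": "3"},
        {"SQLDATE": "20130402", "EventRootCode": "04", "QuadClass": "1"},
    ]
    vocab, table = aggregate_counts(rows, ["EventRootCode", "QuadClass"])
    closes = {"2013-04-01": 100.0, "2013-04-02": 101.0}
    X, y = label_next_day_direction(table, closes)
    print(vocab)  # ['EventRootCode=04', 'EventRootCode=14', 'QuadClass=1', 'QuadClass=3']
    print(X, y)   # [[0, 2, 0, 2]] [1]
```

The resulting `X` and `y` could then be passed to a decision tree learner such as scikit-learn's `DecisionTreeClassifier`. The abstract does not state how the ^STI target is encoded, so the up/not-up label here is an assumption; a regression target on the index level would work the same way with `DecisionTreeRegressor`.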


Similar resources

Big Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions

The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...


Comparative Analysis of GDELT Data Using the News Site Contrast System

The News Site Contrast (NSContrast) system analyzes news articles retrieved from multiple news sites based on the concept of contrast set mining. It can extract terms that characterize different topics of interest across news sites, countries, and regions. In this study, we used NSContrast to analyze Global Database of Events, Language, and Tone (GDELT) data by comparing news articles ...


Two Tales of the World: Comparison of Widely Used World News Datasets GDELT and EventRegistry

In this work, we compare GDELT and Event Registry, which monitor news articles worldwide and provide big data to researchers regarding scale, news sources, and news geography. We found significant differences in scale and news sources, but surprisingly, we observed high similarity in news geography between the two datasets.


An Exploration of Cursor tracking Data

Cursor tracking data contains information about website visitors which may provide new ways to understand visitors and their needs. This paper presents an Amazon Mechanical Turk study where participants were tracked as they used modified variants of the Wikipedia and BBC News websites. Participants were asked to complete reading and information-finding tasks. The results showed that it was poss...


Sensemaking on Wikipedia by Secondary School Students with SynerScope

Visual analytics of linked data can be done by secondary school students with minimal preparation. We study the learning curve of students while answering typical Web analytics questions on Wikipedia and DBpedia using SynerScope visual analytics software. We find that after a short tutorial students are able to answer most complex questions in a few minutes, learning by trial and error. Older s...



Journal:
  • CoRR

Volume: abs/1404.1996

Publication date: 2014